Optimal location of logistics distribution centres with swarm intelligent clustering algorithms

A clustering algorithm is a solution for grouping a set of objects and for distribution centre location problems. But the common K-means clustering algorithm may give local optimal solutions. Swarm intelligent algorithms simulate the social behaviours of animals and avoid local optimal solutions. We employ three swarm intelligent algorithms to avoid these solutions. We propose a new algorithm for the clustering problem, the fruit-fly optimization K-means algorithm (FOA K-means). We designed a distribution centre location problem and three clustering indicators to evaluate the performance of algorithms. We compare the algorithms of K-means with the ant colony optimization algorithm (ACO K-means), particle swarm optimization algorithm (PSO K-means), and fruit-fly optimization algorithm. We find K-Means modified by the fruit-fly optimization algorithm (FOA K-means) has the best performance on convergence speed and three clustering indicators, compactness, separation, and integration. Thus, we can apply FOA K-means to improve the distribution centre location solution and the efficiency for distribution in the future.


Introduction
The development of Machine Learning provides supervised learning and unsupervised learning algorithms. Supervised learning uses labelled data sets to train algorithms to classify data and predict outcomes. We can use a linear model to make an economic data forecast, like the influence of educational input on gross national income (GNI) [1]. A deep neural network is a widely used non-linear model for animal and human face recognition [2] and stock returns and options market forecast [3][4][5]. Unlike supervised learning, unsupervised learning can classify unlabelled data. The clustering algorithm is an unsupervised learning method that groups a set of unlabelled objects.
The clustering algorithm can help solve many problems, like large databases and customer segmentation. Many researchers developed clustering algorithms for various customers segmentation, like in the automobile retailer industry [6], airline customers [7], e-commerce companies [8,9]. Clustering can help forecast customer demand and improve marketing strategy for companies [10]. There have been many clustering methods. The K-means clustering algorithm is widely used in previous cases, and k-means clustering has been widely studied with various extensions in the literature and applied in many substantive areas [11][12][13][14]. However, these k-means clustering algorithms are usually initialized and need to be given some cluster priors. In general, the cluster number is unknown. In this case, the validity index can be used to find the cluster numbers where they should be independent of the clustering algorithm [15]. Many clustering effectiveness metrics for k-means clustering algorithms are already in the literature, such as Bayesian Information Criterion (BIC) [16], Akaike Information Criterion (AIC) [17], Dunn Index [18], Silhouette Width (SW) [19], Calinski and Harabasz Index (CH) [20], Gap Statistics [21], Generalized Dunn Index (DNg) [22], and modified Dunn indices (DNs) [23].so we focus on the K-means clustering algorithm in this paper [24]. The K-means clustering algorithm is efficient, but the simple design may suffer from local optimal solutions. Clustering is combinations of data points in different clusters and, thus, is a combinatorial optimization problem. In practice it has been found that metaheuristic algorithms like Genetic Algorithms (GAs) are better choice for combinatorial optimization [25]. Some researchers try the hybrid genetic algorithm to improve K-means [26,27]. We try to develop K-Means with a swarm intelligence algorithm because the swarm intelligence algorithms have advantages to find a better solution in non-convex programming problems and the swarm intelligence algorithms have advantages to find a better solution in non-convex programming problems compared with other algorithms. Particle swarm optimization (PSO) [28] and Ant Colony Optimization (ACO) [29] have good features to simulate animals behaviours to avoid local solutions. PSO is a new biological evolution algorithm. It starts from a random solution, finds the optimal solution through iteration, and evaluates the quality of the optimal solution through fitness. It is simpler than the genetic algorithm, and there is no "crossover" and " mutation " compared with genetic algorithm, which finds the global optimum by searching optimum value. Each in the ACO algorithm can change the surrounding environment by releasing pheromones, and each can perceive the real-time changes in the surrounding environment, and the individuals communicate indirectly. At the same time, the distributed computing method is adopted in the search process, and multiple individuals perform parallel computing, which greatly improves the computing ability and operating efficiency. We applied the clustering algorithm modified by PSO [30] and ACO [31]. Additionally, we develop a new swarm intelligence algorithm, Fruit fly Optimization (FOA), to the clustering algorithm. Fruit fly Optimization has been proven to find a satisfactory solution and good convergence speed [32][33][34]. By simulating the predation process of fruit flies using their keen sense of smell and vision, FOA realizes a group iterative search. The principle of FOA is easy to understand, simple to operate, easy to implement, and has strong local search ability. The three algorithms probability search method is not easy to fall into the local optimum, and it is easy to find the global optimum solution. Given the advantage of K-Means and FOA, we can obtain a good algorithm for the clustering problem.
Logistics distribution centres are intensive and specialized, serving a set or region of customers. The logistics distribution centres store, sort, distribute, and deliver goods based on customer requirements. Selecting algorithms to achieve lower costs will help companies work efficiently and create more value for society [35]. There are many positive cases about location, such as e-commerce in a city [36], the clothing industry [37], and the locations for relief distribution and victim evacuation after a disaster [38]. Clustering and location problems have similar objectives, so we use a clustering algorithm to help cluster customers and locate the distribution centres.
This paper is organized as follows: Section 2 Methodology shows the details of common K-Means clustering, ACO, PSO, and FOA K-means. Section 3 Results and discussion conducts empirical analysis and describes the superior performance of FOA K-means. Section 4 presents the conclusions.

Methodology
The K-means algorithm is an efficient and widely used algorithm for clustering. The simple design suffers from local optimal solutions in many cases. But the swarm intelligence algorithms can find better solutions in non-convex programming problems. Three swarm intelligence algorithms are employed to solve the location problem for distribution problems. The ant colony optimization algorithm, particle swarm optimization algorithm, and fruit-fly optimization algorithm are introduced as follows.

K-Means algorithm
K-Means clustering is a fast, robust, and simple algorithm with an effective convergence property that gives results when data sets are distinctly separated from each other. It performs well when the number of cluster centres is specified by a well-defined list of types shown in the data. The process of common K-Means is: Step 1: Randomly select K objects from N objects as the cluster centre, where K is the required target classification number.
Step 2: Calculate the distance of all remaining objects to each cluster centre. Use the Euclidean distance to assign objects to the same categories and nearest cluster centre.
Step 3: Determine the cluster centres based on the mean properties of each object in the same categories.
Step 4: Repeat steps (2) and (3) until the new cluster centre is equal to or less than the target threshold and the clustering algorithm ends.

Ant colony optimization algorithm
The ant colony optimization algorithm (ACO) is a probabilistic algorithm for route optimization and clustering analysis. This application performs well for transportation and telecommunication networks [31]. The ant colony algorithm was proposed by Marco Dorigo in 1992 and shows how ants find their way to food. The main characteristics of this algorithm include distributed computing, heuristic search, and feedback. It is an effective algorithm to find a satisfactory solution and avoid falling into local solutions. The clustering algorithm modified by the ant colony algorithm appears in Fig 1. The process is as follows: Step 1: Initialize the ant colony parameters. Set the maximum number of iterations to 1000.
Set the number of cluster samples, attributes, and cluster centres. Set the pheromone evaporation rate rho = 0.1, which will update the pheromone matrix. Set the number R of ants (the solutions) to 100.
Step 2: Initialize the pheromone matrix. The pheromone matrix is a matrix of N � K (sample number � cluster number). Set all pheromone initial values to 0.01.
Step 3: All ants construct a solution set according to the pheromone matrix tau N×K . The higher tau i,k , object i is assigned to categories k and centre k.

Swarm intelligent clustering algorithms
Step 4: Generate the vector r 1×N for each ant. If the corresponding value r i is less than the set pheromone threshold q = 0.9, the path takes the maximum pheromone. If there are multiple maximum values, randomly take one of them from the same maximum value as the path. If the pheromone is greater than the set threshold q = 0.9, the ratio of each path pheromone to the total pheromone of the sample is obtained. The initial path is determined by probabilities based on the pheromone proportion of each path. In this step, each ant may have a different solution.
Step 5: Determine the new centres based on the mean attributes of objects in the same categories.
Step 6: The deviation error F ij is the Euclidean distances of each object to the cluster centre j. The smaller the F ij is, the better the clustering effect is. Calculate the sum D i of deviation smallest error F ij for each sample i to the nearest cluster centre j in each ant and find the best ant solutions D best .
Step 7: Update the pheromone value according to the best L solutions (L = 2) in the last step.
The pheromone matrix update is as follows: tau i,j in bestsolution is updated according to the best solution.
Step 8: Determine whether the maximum number of iterations is reached. If not reached, return step (3); if reached, the optimal cluster solution is output.

Particle swarm optimization algorithm
Kennedy and Eberhart first introduced the particle swarm optimization algorithm (PSO). PSO is effective and highly efficient as well [39]. This algorithm simulates the social behaviours of birds and is a good optimization algorithm with relatively low iteration numbers. A study finds that adding environmental factors can improve the performance of PSO in searching strategy, but the question is how to select the environmental factor more effectively [40].
In the PSO algorithm, each particle can remember the best solution that it has searched for. P id is the best position that the entire particle swarm has experienced. P gd is the best solution in the current search. Each particle has a speed, V represents the velocity of the i-th particle in the d-th dimension, w is the inertia weight, η 1 , η 2 are the importance parameters for P id and P gd , and rand() is the random function that generates a floating variable in the range [0,1]. Calculate: Then the updated position of particle is: The process of the PSO K-Means Algorithm is as follows: Step 1: Initialize the particles and randomly assign each object to a category. Determine cluster centres based on the mean of the objects in the same category. Initialize the velocity of the particles, iterations, and group size.
Step 2: For each particle, calculate the fitness value, and compare it to the recorded best fitness value of P id . If there is one better than P id , update P id ; Step 3: Compare each particle fitness value with the best position in the current group P gd . If there is a better one, update P gd ; Step 4: Determine the velocity and position of the particles based on Formula (1) and Formula (2).
Step 5: Apply the K-means optimization. Update the cluster centre, apply the nearest neighbour rule, and update the particle.
Step 6: If the end condition is reached (maximum iterations M = 200 in this case), the process ends. Otherwise, return to step (2).

Fruit fly optimization algorithm
The fruit fly optimization algorithm (FOA) was proposed by Professor Pan (2011) and has been widely applied in solving programming problems. These problems include logistics storage selection [22] and a mutual fund forecasting model [41], which optimize the parameters of a general regression neural network in the forecasting model and obtain a better solution than PSO. The fruit-fly optimization algorithm simulates the fruit-fly behaviours of seeking food and contributes a swarm intelligence optimization algorithm. The process of the standard fruit fly algorithm is as follows: Step 1: Initialize the population size Sizepop, the number of iterations Maxgen, and the random location of the fruit fly.
Step 2: Each fruit fly randomly searches for food in all directions.
Step 3: Calculate the distance Dist i from each fruit fly to the origin and set the smell value S i .
Dist i ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi Step 4: Substitute the obtained smell value to the fitness function and determine the best fruit fly.
Step 5: Identify the best-performing fruit flies in the fruit fly population.
Step 6: Record the optimal taste density value and the specific X, Y coordinates.
Step 7: Start the iteration, repeat (2) through (6), and if the current iteration time is less than maximum iterations Maxgen, then perform step (2). Otherwise, end the calculation.
The Basic FOA will be reconstructed to make it more suitable for solving clustering problems based on the K-Means algorithm.

K-Means modified by FOA
In the previous result, we can see some flaws while we are using the other algorithms. ACO avoids falling into a local optimal solution, but the convergence speed is low., On the other hand, PSO updates the decisive variables well, but compared with K-Means, it does not show its advantage of being a swarm intelligent algorithm. So, we want to develop an algorithm to combine the characteristics of good convergence speed of K-Means and FOA and Modified FOA. The K-Means algorithm is the new algorithm to satisfy our demands.
We develop the modified FOA K-Means algorithm as follows: Step 1: Design the fruit fly structure for clustering. Each fruit fly is a cluster centre. There are many objects, assigned to N categories. There are N centres, and each centre has k attributes. X i , Y i represent the coordinate vectors of the i-th fruit fly. The size of the fruit-fly population is sizepop. When initializing, x i , y i are randomly generated, and rands(1, N×k) represents a random number of (0,1), which produces one row and N×k columns. rcc is the max attribute value. In the following case, we set it to 6000. D i is the distance from each fruit fly to the origin.
ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi Step 2: The objects closest to the same cluster centre are categorized into the same class. Sum up the Euclidean distances from the cluster centre to each object. Apply the K-means method to generate a new cluster centre CentreBest with the shortest total distance of all solutions.
Step 3: Determine the adjust vector Z i,modi Step 4: Determine the variation from best fruit fly. Compare the original best fruit fly with other flies with mutations. The method of mutation is as follows.
Step 5: Determine whether max iterations is reached. If not, return to step 2. Otherwise, output the result. We set 200 iterations.

Results and discussion
To evaluate the performance of swarm intelligence algorithms, we conduct a calculation test case.

Location problem
In a city, retailers need to send goods to many customers. They must build distribution centres to serve all customers. The locations of customers appear in Fig 2. We need to determine the number of distribution centres. Within a group, the sum of squared error (SSE) is a common method to determine the optimal number of clusters. In Fig  3, the optimal number of clusters is three. With more categories or clusters, there is a shorter distance within the group's sum of squared error. We focus on the change of slope. The decline from one to three categories is rapid and then decreases slowly, so to reduce the construction cost, the optimal number of clusters is three. To serve the needs of the retailers, three logistics distribution centres should be built.   The distribution centres result appears in Table 1.

K-Means modified by ACO.
The ant colony algorithm is used to solve the problem. The initial parameters are as follows. The iteration is 200, the ant number is 200, the sample number is 300, and the cluster number is 3. After running ACO, we find that the ant colony clustering algorithm is unsuitable for this problem. The convergence is poor with more iterations. The result appears in Fig 5. The distribution centres appear in Table 2:

K-Means modified by PSO.
We set initial parameters: the iteration is 200, the particle size is 200, the sample number is 300, and the cluster number is 3. The result appears in Fig 6. The distribution centres are shown in Table 3:

K-Means modified by FOA.
In the previous result, we saw some flaws in the other algorithms. ACO avoids falling into local optimal solution, and PSO updates the decision variables but does not show its advantage of a swarm intelligent algorithm compared with  The distribution centres appear in Table 4: 3.2.5 Comparison of results. In Table 5, Total Distance is a compactness indicator, so smaller is better. Total Centre Distance is the separation indicator, so higher is better. Davies-Bouldin index is an integrated indicator, so smaller is better.
ACO is not suitable for cluster analysis in large clustering samples in the study due to the poor performance of three indicators. Total Distance is the highest, 280379.72. The total centre distance is the lowest, 4149.43. Davies-Bouldin Index is the highest, 4.53. Compared with Common K-Means, MFOA K-Means, and PSO K-Means, the convergence speed of ACO is obviously slower in the computational experiment.
The PSO K-Means and FOA K-Means algorithms can prevent the Common K-Means from falling into a local optimum as well. In this case, PSO K-Means does not perform better than common K-Means due to the higher compactness indicator and higher DBI. PSO K-Means is more likely to fall into a local optimal solution than FOA. The performance of FOA K-Means is better than that of common K-Means and has the best performance in three indicators. FOA K-Means has the smallest DBI, 1.846.
Through the results in Fig 9, we can also draw further conclusions: In the results, we can see that when the FOA K-Means algorithm reached a local optimal state, it still could optimize the solution. If we run more tests on common K-Means, it might generate a worse solution than the previous result. FOA K-Means has a more stable  performance. Another cluster intelligent algorithm, the particle swarm K-Means algorithm, compared to the modified fruit fly optimization K-Means algorithm, has little advantage in this problem. The comparison figure of the algorithms appears in Fig 8 and

PLOS ONE
Swarm intelligent clustering algorithms

Conclusions
We solved the location problem with swarm intelligent clustering algorithms (including the ant colony optimization algorithm, the fruit fly optimization K-Means algorithm, and the particle swarm optimization algorithm) and the common K-Means algorithm. We proposed a new swarm intelligent algorithm, the fruit fly optimization K-Means algorithm (FOA Kmeans). We compared the indicators of algorithm results. The FOA K-Means algorithm has the best performance in the DBI indicator, which has great significance for clustering problems in further studies. FOA K-means has the best performance on convergence speed and three clustering indicators, compactness, separation, and integration. ACO is not suitable for cluster analysis of the large clustering samples in the study due to the poor performance in three indicators. PSO K-Means is easier to fall into a local optimal solution than FOA. Therefore, FOA K-means is the best solution for distribution centre location problems and clustering.
The study has some deficiencies. To analyse the clustering problem further, some actual constraints are neglected in location problems. First, the location of the distribution centres is obtained using the mean of the coordinates. Second, the relative importance of different customers is not considered. Different customers have varios needs, which can affect the location of the distribution centres. The need for excellent customer service becomes important. And any business' ability to understand each segment of its customers' needs will give it a greater advantage in providing targeted customer service and developing customed centres. Future research can explore addressing these problems further.